
FIGURE 6.10
Our LWS-Det. From left to right: the input, the search process, and the learning process. For a given 1-bit convolutional layer, LWS-Det first searches for the binary weights (+1 or −1) by minimizing an angular loss supervised by a real-valued teacher detector; it then learns the real-valued scale factor α to enhance feature representation ability.

where $\otimes$ is the convolution operation. We omit the batch normalization (BN) and activation layers for simplicity. The 1-bit model aims to quantize $\mathbf{w}_i$ and $\mathbf{a}_i$ into $\hat{\mathbf{w}}_i \in \{-1, +1\}$ and $\hat{\mathbf{a}}_i \in \{-1, +1\}$, so that efficient xnor and bit-count operations can replace the full-precision ones. Following [99], the forward process of the 1-bit CNN is

$$\mathbf{a}_i = \operatorname{sign}(\hat{\mathbf{a}}_{i-1} \circledast \hat{\mathbf{w}}_i), \tag{6.66}$$

where $\circledast$ represents the xnor and bit-count operations and $\operatorname{sign}(\cdot)$ denotes the sign function, which returns $+1$ if its input is greater than zero and $-1$ otherwise. This binarization introduces an error, visible in Figs. 6.11(a) and (b): the output of the 1-bit convolution (b) fails to match that of its real-valued counterpart (a) in both angle and amplitude.
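To make the xnor-and-bit-count substitution concrete, here is a minimal PyTorch sketch (all names are ours, not from the text) verifying that a dot product over $\{-1, +1\}$ values equals the xnor-plus-popcount form evaluated on raw bits:

```python
import torch

def binarize(x):
    # sign function of Eq. 6.66: +1 if x > 0, -1 otherwise
    return torch.where(x > 0, torch.ones_like(x), -torch.ones_like(x))

a = torch.randn(64)                 # real-valued activations (one spatial position)
w = torch.randn(64)                 # real-valued weights
a_hat, w_hat = binarize(a), binarize(w)

# Full-precision dot product over the binarized operands ...
dot_fp = torch.dot(a_hat, w_hat)

# ... equals the xnor + bit-count form computed on {0, 1} bits:
a_bits, w_bits = a_hat > 0, w_hat > 0
agree = (~(a_bits ^ w_bits)).sum()  # xnor, then popcount
dot_xnor = 2 * agree - a.numel()    # rescale the count back to +/-1 arithmetic

assert torch.isclose(dot_fp, dot_xnor.float())
```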

Substantial efforts have been made to reduce this error. [199, 228] formulate the objective as

$$L_i^w = \left\| \mathbf{w}_i - \alpha_i \circ \hat{\mathbf{w}}_i \right\|_2^2, \tag{6.67}$$

where $\circ$ denotes channel-wise multiplication and $\alpha_i$ is the vector of channel-wise scale factors. As in Fig. 6.11(c), [199, 228] learn $\alpha_i$ by directly optimizing $L_i^w$ toward 0, which gives the explicit solution

$$\alpha_i^j = \frac{\|\mathbf{w}_i^j\|_1}{C_{i-1} \cdot K_i^j \cdot K_i^j}, \tag{6.68}$$

where $j$ denotes the $j$-th channel of the $i$-th layer. Other works [77] dynamically evaluate Eq. 6.67 rather than solving it explicitly, or modify $\alpha_i$ into other forms [26].
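Equation 6.68 states that the optimal scale factor of each output channel is simply the mean absolute value of its weights. A short sketch of both the closed-form solution and the residual of Eq. 6.67 (PyTorch; the layer shape is illustrative):

```python
import torch

def channelwise_alpha(w):
    # Eq. 6.68: alpha_i^j = ||w_i^j||_1 / (C_{i-1} * K_i^j * K_i^j),
    # i.e., the mean absolute weight of output channel j
    c_out, c_in, k, _ = w.shape
    return w.abs().sum(dim=(1, 2, 3)) / (c_in * k * k)

w = torch.randn(16, 8, 3, 3)        # weight tensor: (C_i, C_{i-1}, K, K)
alpha = channelwise_alpha(w)        # one scale factor per channel: shape (16,)
w_hat = torch.where(w > 0, torch.ones_like(w), -torch.ones_like(w))

# Residual of the reconstruction objective in Eq. 6.67:
err = ((w - alpha.view(-1, 1, 1, 1) * w_hat) ** 2).sum()
```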

Previous work mainly focuses on kernel reconstruction but neglects angular information, as shown in Fig. 6.11(d). One drawback of existing methods lies in their ineffectiveness when binarizing very small floating-point values, as shown in Fig. 6.11. In contrast, we leverage the strong capacity of a differentiable search to fully explore the binary space for an ideal combination of $-1$ and $+1$, without an ambiguous binarization process involved.
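As rough intuition for how a choice between $-1$ and $+1$ can be made differentiable, the hypothetical sketch below relaxes each binary weight into a softmax over the two candidate values and minimizes a cosine (angular) loss against a stand-in teacher signal. The setup, names, and loss here are illustrative only; the actual LWS-Det search is formulated in the next subsection.

```python
import torch
import torch.nn.functional as F

candidates = torch.tensor([-1.0, 1.0])           # the binary search space
logits = torch.randn(8, 2, requires_grad=True)   # per-weight choice logits (8 weights)
teacher = torch.randn(8)                         # stand-in for real-valued teacher weights

opt = torch.optim.Adam([logits], lr=0.1)
for _ in range(200):
    probs = F.softmax(logits, dim=-1)            # differentiable selection weights
    w_soft = probs @ candidates                  # expected weight value in [-1, 1]
    loss = 1 - F.cosine_similarity(w_soft, teacher, dim=0)  # angular loss
    opt.zero_grad()
    loss.backward()
    opt.step()

w_binary = candidates[logits.argmax(dim=-1)]     # discretize once the search ends
```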

6.4.2 Formulation of LWS-Det

We regard the 1-bit object detector as a student network, which is searched and learned layer by layer under the supervision of a teacher network (a real-valued detector). Our overall framework is